
56 research outputs found

CHIPS: Custom Hardware Instruction Processor Synthesis

Author: Can Ozturan
Günhan Dündar
Kubilay Atasu
Oskar Mencer
Wayne Luk
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Accuracy-guaranteed bit-width optimization

Author: Altaf Abdul Gaffar
Dong-u. Lee
George A. Constantinides
Oskar Mencer
Ray C. C. Cheung
Wayne Luk
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1990

Published version

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

High performance Boson Sampling simulation via data-flow engines

Author: Kaposi Ágoston
Kolarovszki Zoltán
Kozsik Tamás
Mencer Oskar
Morse Gregory
Oszmaniec Michał
Rakyta Péter
Rybotycki Tomasz
Stojčić Uroš
Zimborás Zoltán
Publication date: 17/09/2023

In this work, we generalize the Balasubramanian-Bax-Franklin-Glynn (BB/FG) permanent formula to account for row multiplicities during permanent evaluation, reducing the complexity of permanent evaluation in scenarios where such multiplicities occur. This is achieved by incorporating n-ary Gray code ordering of the addends during the evaluation. We implemented the designed algorithm on FPGA-based data-flow engines and used the resulting accelerator to speed up boson sampling (BS) simulations with up to 40 photons, drawing samples from a 60-mode interferometer at an average rate of approximately 80 seconds per sample using 4 FPGA chips. We also show that the performance of our BS simulator is in line with the theoretical estimate of Clifford & Clifford [clifford2020faster], which provides a way to define a single parameter that characterizes the performance of the BS simulator in a portable way. The developed design can be used to simulate both ideal and lossy boson sampling experiments.

Comment: 25 pages

arXiv.org e-Print Archive
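As a rough illustration of the row-multiplicity idea, the Python sketch below evaluates a Glynn-type permanent formula for an n x n matrix in which each distinct row rows[i] appears mult[i] times. It is a reconstruction under stated assumptions, not the authors' FPGA algorithm, and the names `gray_sequence`, `permanent_with_multiplicities` and `permanent_bruteforce` are hypothetical. Instead of a ±1 sign for every copy of a row, it enumerates only k[i], the number of copies of row i with sign -1, weighted by (-1)^k[i] times a binomial coefficient, so each column sum becomes sum_i (mult[i] - 2 k[i]) a[i][j] and the result is divided by 2^(n-1). The k vectors are visited in a reflected mixed-radix (n-ary) Gray code, so each step changes one k[i] by ±1 and the column sums update in O(n).

```python
from itertools import permutations
from math import comb, prod


def permanent_bruteforce(a):
    """Reference O(n! * n) permanent, used only to check the faster version."""
    n = len(a)
    return sum(prod(a[i][p[i]] for i in range(n)) for p in permutations(range(n)))


def gray_sequence(radices):
    """Reflected mixed-radix Gray code: digit i runs over 0..radices[i]-1 and
    consecutive tuples differ in exactly one digit, by +1 or -1."""
    if not radices:
        return [()]
    rest = gray_sequence(radices[1:])
    seq = []
    for d in range(radices[0]):
        block = rest if d % 2 == 0 else rest[::-1]  # reflect every other block
        seq.extend((d,) + t for t in block)
    return seq


def permanent_with_multiplicities(rows, mult):
    """Glynn-type permanent of the n x n matrix in which rows[i] is repeated
    mult[i] times (n = sum(mult), mult[0] >= 1). Hypothetical helper."""
    n = sum(mult)
    # k[i] = number of copies of rows[i] whose sign is -1. The first copy of
    # rows[0] is pinned to +1, so k[0] only ranges over 0..mult[0]-1.
    limits = [mult[0] - 1] + list(mult[1:])
    codes = gray_sequence([m + 1 for m in limits])
    # Column sums s_j = sum_i (mult[i] - 2*k[i]) * rows[i][j], starting at k = 0.
    s = [sum(m * r[j] for m, r in zip(mult, rows)) for j in range(n)]
    total = 0
    prev = codes[0]
    for code in codes:
        # Gray-code property: at most one k[i] changed, so s updates in O(n).
        for i, (old, new) in enumerate(zip(prev, code)):
            if old != new:
                s = [sj - 2 * (new - old) * rows[i][j] for j, sj in enumerate(s)]
        prev = code
        weight = prod((-1) ** k * comb(lim, k) for k, lim in zip(code, limits))
        total += weight * prod(s)
    return total / 2 ** (n - 1)


print(permanent_with_multiplicities([[1, 1, 2], [0, 3, 1]], [2, 1]))  # 14.0
```

The sum has prod(limits[i] + 1) terms instead of 2^(n-1); for example, a single row repeated n times needs only n terms. The sketch materializes the whole Gray code list for simplicity, whereas a streaming generator would be closer to a hardware pipeline.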

A Selection of Recent Advances in Computer Systems

Author: Michael Flynn
Oskar Mencer

This paper presents a selection of recent advances in computer systems. The roadmap for CMOS technology for the next ten years shows a theoretical limit of 0.1 μm for the channel length of a MOSFET transistor, to be reached by 2007. Mainstream processors are adapting to multimedia applications with subword-parallel instructions such as Intel's MMX and HP's MAX instruction set extensions. Coprocessors and embedded processors are moving towards VLIW in order to save hardware costs. The memory system of the future is going to be the next generation of Rambus/RDRAM. Finally, Custom Computing Machines based on Field Programmable Gate Arrays are among the promising future technologies for computing, offering very high performance for highly parallelizable and pipelinable applications.

CiteSeerX
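Subword parallelism can be illustrated without MMX itself. The Python sketch below is a software "SIMD within a register" (SWAR) emulation, chosen here as an assumption for illustration, that adds eight packed 8-bit lanes held in one 64-bit word, roughly the operation a packed-byte add instruction such as MMX's PADDB performs in hardware. The helpers `pack`, `unpack` and `add_bytes_swar` are hypothetical.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F  # low 7 bits of each 8-bit lane
HIGH = 0x8080808080808080  # top bit of each 8-bit lane


def pack(lanes):
    """Pack eight unsigned bytes into one 64-bit word, lane 0 in the low byte."""
    word = 0
    for i, b in enumerate(lanes):
        word |= (b & 0xFF) << (8 * i)
    return word


def unpack(word):
    """Split a 64-bit word back into eight unsigned bytes."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def add_bytes_swar(x, y):
    """Lane-wise wraparound addition of eight bytes using ordinary word ops.
    Adding only the low 7 bits of each lane cannot carry into the next lane;
    the top bit of each lane is then fixed up with XOR, which also discards
    that lane's carry-out."""
    return ((x & LOW7) + (y & LOW7)) ^ ((x ^ y) & HIGH)


a = [250, 1, 2, 3, 128, 255, 0, 100]
b = [10, 1, 2, 3, 128, 1, 0, 27]
print(unpack(add_bytes_swar(pack(a), pack(b))))  # [4, 2, 4, 6, 0, 0, 0, 127]
```

One word-sized add thus performs eight independent byte additions, which is the source of the speedup that subword instructions offer for multimedia data.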

Dataflow Computing for Data-Intensive Applications

Author: Mencer Oskar
Publication date: 01/01/2012

CERN Document Server

Dynamic Circuit Generation for Boolean Satisfiability in an Object-Oriented Design Environment

Author: Marco Platzner
Oskar Mencer
Publication date: 01/01/1999

We apply our object-oriented design environment PAM-Blox to the dynamic generation of circuits for reconfigurable computing. Our approach combines the structural hardware design environment with commercial synthesis of finite state machines (FSMs). The PAM-Blox environment features a well-defined hardware object interface and the ability to control the placement of hand-optimized circuits. We integrate the advantages of an object-oriented design environment, with full control over placement at every level of abstraction, with commercial FSM synthesis and optimization. As the driving application, we consider reconfigurable hardware accelerators for the NP-complete Boolean satisfiability problem. These accelerators require fast compilation of circuits consisting of instance-specific datapaths and control automata. By providing FSM optimization and control over placement, our design environment enables the maximization of performance.

CiteSeerX
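A software sketch of what such an instance-specific accelerator computes may help. In the Python below, written purely for illustration, `compile_instance` plays the role of the instance-specific datapath (a checker generated for one formula) and `solve` plays the role of a simple control automaton that steps through assignments like a counter. This is plain brute-force search under those assumptions, not the PAM-Blox circuits or the authors' search strategy, and all names are hypothetical.

```python
from itertools import product


def compile_instance(clauses):
    """Specialise a checker to one CNF instance (DIMACS-style literals:
    +v means variable v is true, -v means it is false, v starting at 1)."""
    def satisfied(assignment):
        # Every clause needs at least one literal made true by the assignment.
        return all(
            any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        )
    return satisfied


def solve(clauses, num_vars):
    """Enumerate all 2**num_vars assignments in counter order and return the
    first satisfying one, or None if the instance is unsatisfiable."""
    satisfied = compile_instance(clauses)
    for assignment in product((False, True), repeat=num_vars):
        if satisfied(assignment):
            return assignment
    return None


print(solve([[1, 2], [-1, 2], [-2, 3]], 3))  # (False, True, True)
```

In hardware, the clause checks would be evaluated in parallel by a circuit generated for the instance, which is why fast per-instance circuit compilation matters for this application.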

Parallel, Pipelined CORDICs for Reconfigurable Computing

Author: Martin Morf
Oskar Mencer

Reconfigurable computing has shown impressive successes with data-intensive and latency-tolerant applications. Pipelined and parallel implementations of CORDICs can achieve very high throughput for rotation and various other functions, such as multiplication and division, as well as hyperbolic and other higher-order functions. Reconfiguration allows us to adapt the implementation of CORDICs and related architectures to the specific needs and properties of individual applications or specific sets of applications, hence creating application-specific CORDIC implementations. It is therefore becoming evident that CORDICs are very well suited to reconfigurable computing and custom computing machines.

CiteSeerX
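To make the shift-and-add structure concrete, a floating-point Python model of rotation-mode CORDIC for sine and cosine follows. It is an assumed textbook formulation, not the authors' design, and `cordic_sin_cos` is a hypothetical name. A pipelined hardware version would unroll the loop into one stage per iteration and replace each multiplication by 2**-i with a wired shift.

```python
import math


def cordic_sin_cos(theta, iterations=32):
    """Rotation-mode CORDIC for |theta| <= pi/2; returns (cos(theta), sin(theta)).
    Each iteration uses only add/subtract and scaling by 2**-i, so it maps
    naturally onto one pipeline stage."""
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]  # constant table
    # Start from x = K so the accumulated CORDIC gain 1/K cancels out,
    # where K = prod over i of 1/sqrt(1 + 2**-2i).
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward the residual angle z
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y


c, s = cordic_sin_cos(0.7)
print(abs(c - math.cos(0.7)) < 1e-8, abs(s - math.sin(0.7)) < 1e-8)  # True True
```

Throughput comes from feeding a new angle into the unrolled pipeline every cycle; the 32-iteration default is an arbitrary choice that sets the precision to roughly 2**-31 radians.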

Application-Specific Number Representation

Author: Fu Haohuan
Luk Wayne
Mencer Oskar
Publication date: 01/01/2009

Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application-specific number representations. Well-known number formats include fixed-point, floating-point, logarithmic number system (LNS), and residue number system (RNS). These different number representations lead to different arithmetic designs and error behaviours, thus producing implementations with different performance, accuracy, and cost. To investigate the design options in number representations, the first part of this thesis presents a platform that enables automated exploration of the number representation design space. The second part of the thesis shows case studies that optimise the designs for area, latency or throughput from the perspective of number representations. Automated design space exploration in the first part addresses the following two major issues:
• Automation requires arithmetic unit generation. This thesis provides optimised arithmetic library generators for logarithmic and residue arithmetic units, which support a wide range of bit widths and achieve significant improvement over previous designs.
• Generation of arithmetic units requires specifying the bit widths for each variable. This thesis describes an automatic bit-width optimisation tool called R-Tool, which combines dynamic and static analysis methods, and supports different number systems (fixed-point, floating-point, and LNS numbers).
Putting it all together, the second part explores the effects of application-specific number representation on practical benchmarks, such as radiative Monte Carlo simulation and seismic imaging computations. Experimental results show that customising the number representations brings benefits to hardware implementations: by selecting a more appropriate number format, we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%. On the performance side, hardware implementations with customised number formats achieve 5 to potentially over 40 times speedup over software implementations.

EThOS - Electronic Theses Online Service
Overseas Research Students Award Scheme and UK Engineering and Physical Sciences Research Council
United Kingdom

OpenGrey Repository
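To illustrate how a static and a dynamic analysis can differ when choosing fractional bit widths, the Python sketch below makes simplifying assumptions: a single variable, round-to-nearest fixed-point, and integer bits ignored. It is not R-Tool, and the functions `quantize`, `static_frac_bits` and `dynamic_frac_bits` are hypothetical.

```python
def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (a fixed-point value).
    Python's round() resolves ties to even, which still respects the bound below."""
    scale = 2 ** frac_bits
    return round(x * scale) / scale


def static_frac_bits(tol):
    """Analytical choice: round-to-nearest error is at most 2**-(f+1), so pick
    the smallest f with 2**-(f+1) <= tol. Valid for any input value."""
    if tol <= 0:
        raise ValueError("tolerance must be positive")
    f = 0
    while 2.0 ** -(f + 1) > tol:
        f += 1
    return f


def dynamic_frac_bits(samples, tol, max_bits=52):
    """Simulation-based choice: smallest f whose measured worst-case error over
    the samples is <= tol. It can beat the static bound, but only holds for
    the inputs that were actually simulated."""
    for f in range(max_bits + 1):
        if max(abs(quantize(v, f) - v) for v in samples) <= tol:
            return f
    return None


print(static_frac_bits(1e-3))                # 9
print(dynamic_frac_bits([0.5, 0.25], 1e-3))  # 2
```

The static answer is safe for every input, while the dynamic answer is smaller here only because 0.5 and 0.25 happen to be exactly representable with two fractional bits. Combining both kinds of analysis trades guaranteed accuracy against area.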

Application of reconfigurable CORDIC architectures

Author: Luc Semeria
Martin Morf
Oskar Mencer

Very high performance architectures can be designed for data intensive and latency tolerant applications by maximizing the parallelism and pipelining at the algorithm and bit level. This is achieved by combining such technologies as reconfigurable or adaptive computing and CORDIC style arithmetic, for computing (possibly hyperbolic) rotations, multiply, divide, and related higher order functions (e.g. square-root, multidimensional rotations). Reconfiguration allows adapting the implementation of such functions to the specific needs of individual or specific sets of applications, from multi-media to radar and sonar, hence creating application specific CORDIC-style implementations. We show a high-throughput CORDIC for reconfigurable computing, a low latency CORDIC, and discuss an application to adaptive filtering (normalized ladder algorithm).

CiteSeerX
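The same shift-and-add iteration also covers division when CORDIC runs in linear vectoring mode. The Python sketch below is an assumed textbook formulation, not the architecture from this paper, and `cordic_divide` is a hypothetical name. It keeps the invariant y_i + x * z_i = y_0 while driving the residual y_i toward zero, so z converges to y_0 / x with error at most 2**-(iterations - 1), provided x > 0 and |y_0 / x| <= 2.

```python
def cordic_divide(y, x, iterations=40):
    """Linear vectoring-mode CORDIC: approximates y / x using only add/subtract
    and scaling by 2**-i. Assumes x > 0 and |y / x| <= 2."""
    z = 0.0
    for i in range(iterations):
        step = 2.0 ** -i             # a shift in fixed-point hardware
        d = 1.0 if y >= 0 else -1.0  # drive the residual y toward zero
        y -= d * x * step
        z += d * step                # accumulate the signed quotient digit
    return z


print(abs(cordic_divide(1.0, 3.0) - 1.0 / 3.0) < 1e-9)  # True
```

Because every iteration has the same add-and-shift form as the rotation mode, a reconfigurable design can reuse one pipelined datapath and switch only the direction rule and the angle/step table between functions.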